System and method for context-rich database optimized for processing of concepts

ABSTRACT

A computer-implemented method and system of disciplining data input to a multi-contextual and multi-dimensional database is described. A template identifying subject matter specific settings is provided, where the settings comprise a default ontology or taxonomy, a reserved word or phrase list, and restricted value ranges, and where the template includes a template identification and a version. A new record is generated based on the template. Alternatively, a record is generated by copying or inheriting information from an existing record in the database. A unique identification code is assigned to the record including a reference to the template identification and version. A weight is assigned to each record representing a level of diligence used at a time of data entry, where the level of diligence is dependent in part on the default ontology or taxonomy and the reserved word or phrase list.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 13/437,362, filed Apr. 2, 2012, which is a continuation of U.S. application Ser. No. 11/625,761, filed Jan. 22, 2007, titled “SYSTEM AND METHOD FOR CONTEXT-RICH DATABASE OPTIMIZED FOR PROCESSING OF CONCEPTS,” and issued as U.S. Pat. No. 8,150,857, which claims the benefit of U.S. Provisional Application No. 60/760,751, filed Jan. 20, 2006 and titled “SYSTEM AND METHOD FOR STANDARDIZING THE DESCRIPTION OF INFORMATION,” and of U.S. Provisional Application No. 60/760,729, filed Jan. 20, 2006 and titled “SYSTEM AND METHOD FOR INFORMATION RETRIEVAL,” all of which are hereby incorporated by reference.

This application is related to U.S. application Ser. No. 11/656,885, filed Jan. 22, 2007, and issued as U.S. Pat. No. 7,941,433 on May 10, 2011 and titled “SYSTEM AND METHOD FOR MANAGING CONTEXT-RICH DATABASE,” and to U.S. application Ser. No. 13/103,875, filed May 9, 2011 and titled “SYSTEM AND METHOD FOR MANAGING CONTEXT-RICH DATABASE”, all of which are hereby incorporated by reference.

BACKGROUND

Field of the Invention

The invention relates to databases, and in particular, to a multi-contextual, multi-dimensional database optimized for processing of concepts.

Description of the Related Technology

Conventional databases are typically designed with a single purpose in mind, within a closed system. There is a growing interest in the marketplace to share data in order to have multiple systems interoperate and, in some cases, benefit from a larger body of experience. FIG. 1A illustrates the limited record/field structure of a conventional database record and the limited number of associations possible from records lacking contextual robustness and depth.

Much research has been done over the years to solve the challenge of a uniform representation of human knowledge. Solutions ranging from fixed taxonomies and ontologies to the more recent specification for the Semantic Web have made noble attempts at overcoming these challenges, but important gaps remain. Persistent problems can be summarized as follows:

1. Knowledge is created, recorded, transmitted, interpreted and classified by different people in different languages with different biases and using different methodologies thus resulting in inaccuracies and inconsistencies.
2. The vast majority of knowledge is expressed in free-form prose language convenient for human interpretation but lacking the structure and consistency needed for accurate machine processing. Current knowledge tends to be represented in a one-dimensional expression where the author makes a number of assumptions about the interpreter.
3. There is no international standard or guideline for expressing the objects and ideas in our world and therefore no way to reconcile the myriad ways a single idea may be represented.
4. As a result of the foregoing, the vast majority of information/knowledge in existence today is either inaccurate, incomplete or both. As a result, true industry-wide or global collaboration on projects ranging from drug discovery to homeland security is effectively prevented.
5. There are several reasons for this shortcoming: a) very few people have the training of an Information Scientist capable of capturing the multidimensional complexity of knowledge; and b) even with such training, the process has been extremely onerous and slow using current methods.
6. Compounding on these challenges is the fact that both new knowledge creation and the velocity of change is increasing exponentially.
7. Even though an abundance of sophisticated database technology is available today, improved data mining and analysis is impossible until the quality and integrity of data can be resolved.

SUMMARY OF CERTAIN INVENTIVE ASPECTS

In one embodiment, there is a computer-implemented method of disciplining data input to a multi-contextual and multi-dimensional database, the method comprising providing a template identifying subject matter specific settings, wherein the settings comprise a default ontology or taxonomy, and wherein the template includes a template identification and a version; generating a new record based on the template; and assigning a unique identification code to the record including a reference to the template identification and version.

The method may additionally comprise generating a record by copying or inheriting information from an existing record in the database. The method may additionally comprise assigning a weight to each record representing a level of diligence used at a time of data entry, wherein the level of diligence is dependent in part on the default ontology or taxonomy and the reserved word or phrase list. The settings may additionally comprise a reserved word or phrase list, and restricted value ranges. The template may include one or more fields for identification of an object or phenomenon, subject classification, physical measures, observables, economic variables, social, political, and environmental properties. The field information may include a reference to source identity, location, file format and size. The subject classification may include higher, lower and equivalent classifications, and may include a reference to a source. Each field of the template may include one or more attributes for international equivalents, source, author, date, time, location, template identification, value list, security level, encryption and version. A physical embodiment of an object or phenomenon may be represented in the multi-contextual and multi-dimensional database by at least one of a self-contained data object, a web page, and a record in a relational database. The template may be subject specific. The method may additionally comprise scoring the record on a relative scale based on the quality of a source of the information in the record and a gathering system of the information.
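By way of non-limiting illustration, the steps of providing a template, generating a new record from it, and assigning an identification code that references the template identification and version may be sketched as follows. The class names, the colon-separated code layout, and the use of a random UUID for the unique portion are assumptions made only for this sketch and are not prescribed by the method.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Template:
    """Hypothetical subject-specific template (names are illustrative only)."""
    template_id: str
    version: str
    default_taxonomy: str
    reserved_words: tuple = ()


@dataclass
class Record:
    """Hypothetical record that remembers which template produced it."""
    record_id: str
    template_id: str
    template_version: str
    fields: dict = field(default_factory=dict)


def new_record(template, unique_part=None):
    """Create a record whose code embeds the template id and version.

    The "<template id>:<version>:<unique part>" layout is an assumption.
    """
    unique_part = unique_part or uuid.uuid4().hex
    record_id = f"{template.template_id}:{template.version}:{unique_part}"
    return Record(record_id, template.template_id, template.version)


def copy_record(existing, template, unique_part=None):
    """Create a record that inherits field values from an existing record."""
    record = new_record(template, unique_part)
    record.fields = dict(existing.fields)  # shallow copy of inherited values
    return record
```

In this sketch, a record created from a template identified as `MED-VIAL` at version `1.2` with unique part `abc123` would carry the code `MED-VIAL:1.2:abc123`, so the originating template and version can be recovered from the code alone.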

In another embodiment, there is a computer-implemented system for disciplining data input to a multi-contextual and multi-dimensional database, the system comprising means for providing a template identifying subject matter specific settings, wherein the settings comprise a default ontology or taxonomy, and wherein the template includes a template identification and a version; means for generating a new record based on the template; and means for assigning a unique identification code to the record including a reference to the template identification and version.

The system may additionally comprise means for generating a record by copying or inheriting information from an existing record in the database. The system may additionally comprise means for assigning a weight to each record representing a level of diligence used at a time of data entry, wherein the level of diligence is dependent in part on the default ontology or taxonomy and the reserved word or phrase list. The settings may additionally comprise a reserved word or phrase list, and restricted value ranges. The template may include one or more fields for identification of an object or phenomenon, subject classification, physical measures, observables, economic variables, social, political, and environmental properties. The field information may include a reference to source identity, location, file format and size. The subject classification may include higher, lower and equivalent classifications, and may include a reference to a source. Each field of the template may include one or more attributes for international equivalents, source, author, date, time, location, template identification, value list, security level, encryption and version. A physical embodiment of an object or phenomenon is represented in the multi-contextual and multi-dimensional database by at least one of a self-contained data object, a web page, and a record in a relational database. The template may be subject specific. The system may additionally comprise means for scoring the record on a relative scale based on the quality of a source of the information in the record and a gathering system of the information.

In yet another embodiment, there is a system for disciplining data input to a multi-contextual and multi-dimensional database, the system comprising a template identifying subject matter specific settings, wherein the settings comprise a default ontology or taxonomy, and wherein the template includes a template identification and a version; a memory storing the template; and a processor connected to the memory and configured to generate a new record based on the template, and assign a unique identification code to the record including a reference to the template identification and version.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A is a diagram showing a conventional database record which is inherently designed for a very narrow purpose and therefore is limited in its ability to provide value in data mining.

FIG. 1B is a diagram of a contextual data cluster or repository which contains broad contextual information.

FIG. 2 is a block diagram illustrating the neutral nature of the context-rich data repository where interoperability and data sharing take precedence over form, and highlighting that information can morph into different forms depending upon the task at hand.

FIG. 3 is a diagram illustrating the contextual complexity of information on a typical medicine vial having references to chemicals, product codes, manufacturers, retailers, usage instructions, safety notices, laws, visual elements, etc.

FIG. 4 is a block diagram of an embodiment of an example context-rich data repository system having an input management subsystem and a query management subsystem.

FIG. 5 is a block diagram of an embodiment of system software modules and data resources used by the system shown in FIG. 4.

FIG. 6 is a diagram of an embodiment of system software modules operating on the context-rich data repository shown in FIG. 4.

FIG. 7 is an example illustrating the process of mapping unstructured data into a context-rich data structure.

FIG. 8 is an example of the subject-specific template.

FIG. 9 is an example illustrating the process of calculating a qualitative score for each entity based on all records associated with that entity.

FIG. 10 is an example illustrating the process of calculating a qualitative score for each record based on all parameters included in that record.

FIG. 11 is an example illustrating the process of scoring each entry within a record.

FIG. 12 is an example illustrating a table which may be used in the process of mapping unstructured data into a context-rich data structure.

FIG. 13 is an example of a user query message.

FIG. 14 is an example illustrating the process of retrieving data based upon user-designated quality parameters.

FIG. 15 is a flowchart of one embodiment of a method of formatting unstructured data.

FIG. 16 is a flowchart of an embodiment of the structured data entry module shown in FIG. 6.

FIG. 17 is a flowchart of one embodiment of a method of linking a data object to one or more templates.

FIG. 18 is a flowchart of one embodiment of a method of evaluating the quality of a data object.

FIG. 19 is a flowchart of one embodiment of a method of adding contextual data to a data object.

FIG. 20 is a flowchart of one embodiment of a method of searching a database based on a user request.

FIG. 21 is a flowchart of one embodiment of a method of generating query syntax based on a user query message.

FIG. 22 is a flowchart of one embodiment of a method of interacting with a user to narrow a search result.

DETAILED DESCRIPTION OF CERTAIN INVENTIVE EMBODIMENTS

The following detailed description of certain embodiments presents various descriptions of specific embodiments of the invention. However, the invention can be embodied in a multitude of different ways as defined and covered by the claims. In this description, reference is made to the drawings wherein like parts are designated with like numerals throughout.

The terminology used in the description presented herein is not intended to be interpreted in any limited or restrictive manner, simply because it is being utilized in conjunction with a detailed description of certain specific embodiments of the invention. Furthermore, embodiments of the invention may include several novel features, no single one of which is solely responsible for its desirable attributes or which is essential to practicing the inventions herein described.

This system builds on the premise that mankind will never agree on a single definition of anything. Broad collaboration will always result in a multitude of data sets describing the same experiment or object, each claiming to be the definitive representation. Likewise, there will always be a multitude of definitions or classifications each vying to be the authority. The prevailing approach to database design or data mining today is to restrict input data sources as a means of overcoming this problem. As a result, not only do designers unwittingly introduce a bias that affects all system results, but they effectively ignore the inherent volatility of information: it is in a constant state of update, addition and revision.

In contrast, the system and method described herein embraces the aforementioned social and cultural aberrations by describing a system that performs the equivalent of an Esperanto language, optimized for machine rather than human processing of complex, imperfect data sets. Further, the system provides a logical mechanism for storing a multitude of related versions of a concept and the means for resolving disparities in a common fashion, thus constricting ambiguity to a minimum. The potential benefits from this invention are immeasurable. Traditionally, researchers have been largely relegated to data mining information generated within their own limited domain without the benefit of understanding the related successes and failures of other comparable international efforts. This system provides a foundation for the creation and maintenance of a single body of human knowledge and experience, thus increasing, by several orders of magnitude, the velocity and efficiency of innovation and learning.

The system is focused on solving the Achilles heel of machine analysis of knowledge: the creation of a universal knowledge repository, such as a multidimensional, context-rich database or data structure, that reflects the nature of the real world. The system describes methods for taking advantage of the fact that any given concept may be represented in a hundred different ways. Rather than selecting one record as “authoritative” like conventional methods, this system analyzes all the versions in a multi-step process to distill the raw material to a series of unique patterns representing human concepts optimized for machine analysis. Each record is analyzed for its relative context in the source document, subject-matter templates, semantic trees and other source documents to build a statistical model representing the aggregation of all perspectives on the topic. This process effectively produces a single abstracted answer with the highest confidence level.

The system applies to both physical and virtual objects (e.g., a book and a digital file) as well as ideas or language elements (e.g., concepts, nouns, verbs, adjectives). Data records can be imported from, or exported to, a multitude of static or linked file formats such as: text, MS Word, hypertext markup language (HTML), extensible markup language (XML), and symbolic link (SYLK). Rather than attempting to force the world into a single view, the system supports multiple perspectives/values for any given object and provides the consumer/user with a way for dynamically setting filters based on variables such as information source, source authority, record robustness, timeliness, statistical performance, popularity, etc.

Example uses are as follows:

1. A global catalog of all manufactured products, specifications, sources, terms of business, product text, prices, vendors, etc.
2. A global knowledgebase of all counter-terrorism sensors, surveillance, and warnings
3. A global knowledgebase containing all known life-forms and their attributes: cells, genes, organisms, animals, etc.
4. A global knowledgebase containing all known body functions, observation ranges, symptoms, attributes, therapies

Referring to FIG. 1B, a diagram of a context-rich data cluster or repository 110 will be described. The context-rich data repository 110 contains broad contextual information and has multiple dimensions. The data repository 110 has unlimited associations and no predefined purpose. The data repository 110 can be considered as a virtual entity having multiple records describing and referring to a same object or phenomenon so as to generate a complete definition with various perspectives.

Referring to FIG. 2, the neutral nature of a context-rich data repository 200 is illustrated where interoperability and data sharing take precedence over form. Information can morph into different forms depending upon the task at hand. For example, data associated with a physical object, such as a medicine vial 210, can be embodied as a self-contained data object 220, a record in a relational database 230, or a web page 240.

Each record in the system data repository 200 can include the following features (entries are either selected from a predefined list of valid terms or validated against criteria):

1. A unique resource identifier code, such as a universally unique identifier (UUID) or digital object identifier (DOI)
2. One or more fields that represent other coding equivalents (e.g., Social Security number (SSN), international standard book number (ISBN))
3. One or more fields that represent the authority of the information source
4. Data origination specifications (device model, serial number, range, etc.)
5. One or more fields that represent permissions (users, user groups, operating systems, software, search spiders)
6. One or more fields that represent language equivalents (translations)
7. One or more fields that represent the local measurement conversion of the data
8. One or more fields that represent the observable characteristics and equivalents/observations (visual images: a) front and b) rear, video, radar, IR, sound)
9. One or more fields that represent the physical parameters (length, height, width, diameter, weight, etc.)
10. One or more fields that represent the substance/composition makeup (ingredients, atomic elements, cell structure)
11. One or more fields that represent the relationship to other things (master, sub-assembly, component)
12. One or more fields that represent the economic context (suppliers, buyers, prices)
13. One or more fields that represent the social context (environmental, safety, disclosure)
14. One or more fields that represent the political context (laws, regulations)
15. One or more fields that represent the associated time, date, and location of the event or phenomenon

Each field in the system data repository can include one or more attributes that:

1. represent the primary name
2. are synonymous with the primary name
3. represent the privacy level
4. represent the user rights (actions) and level
5. represent the access/security level
6. represent the qualitative rating level
7. represent the hierarchical inheritance tree position in a taxonomy or ontology
8. represent the version of the data
9. represent the valid date range of the data
10. contain a dynamic link to external data
11. represent a digital signature that authenticates the data
12. represent the permission level for search engine spider access
13. represent the data entry person's identification
14. represent the data entry date, time and place
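As a hedged illustration of how a single field and a subset of the attributes listed above might be held in memory, the following sketch models a field entry carrying a primary name, synonyms, a security level, a qualitative rating, a version, and the data entry person's identification. The class name, attribute names, and default values are assumptions chosen for the example only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FieldEntry:
    """Hypothetical field with a few of the attributes listed above."""
    primary_name: str
    value: object
    synonyms: list = field(default_factory=list)  # names synonymous with primary
    security_level: int = 0                       # access/security level
    quality_rating: Optional[float] = None        # qualitative rating level
    version: str = "1"                            # version of the data
    entered_by: Optional[str] = None              # data entry person's id

    def matches_name(self, name):
        """Return True if name equals the primary name or any synonym."""
        wanted = name.casefold()
        if wanted == self.primary_name.casefold():
            return True
        return any(wanted == s.casefold() for s in self.synonyms)
```

For example, a field with primary name `weight` and synonym `mass` would be located by a lookup for either term, regardless of letter case.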

In the data repository 200, fields, records and clusters may be delimited by symbols or advanced syntax like XML. The specification shows a flattened, expanded record but implementation is preferred in a relational or linked structure. Each data entry preferably includes attributes clarifying information relevant to interoperability, e.g., format, rate, measurement system, data type, etc.

Referring to FIG. 3, data elements 320 associated with the example medicine vial 210, shown in FIG. 2, will now be described. FIG. 3 illustrates the contextual complexity of information on a typical medicine vial. Information regarding a universal product code (UPC) barcode, brand name, plant identifier, serial number, size, weight, product count, radio frequency identification (RFID) code, ingredients, dosage, instructions, and warning text can all be associated with the medicine vial 210. As long as the vial is in existence, all its data elements are living links to multiple dynamic entities. Currently, there is no efficient means for validating or researching most of the information represented on the vial. This system would effectively provide a single resource capable of answering any consumer need related to the product, from ordering a refill to overdose intervention.

Referring to FIG. 4, an embodiment of an example context-rich data repository system 400 will be described. The system 400 includes a context-rich data repository 410 in communication with an input management subsystem 420 and a query management subsystem 430. The data repository 410 further connects via a network, such as a wide-area network, the Internet, or other types of networks, to lists, ontologies and/or taxonomies 414. The input management subsystem 420 further connects directly or by use of a web crawler 422 via the network 412 to authors 424, devices 426, public databases 428 and private databases 429. The query management subsystem 430 further connects directly or via the network 412 to a user 432, a computing system 434 and a search engine 436. Operation of the system 400 will be described herein below.

Referring to FIG. 5, an embodiment 500 of system software modules and data resources used by the system 400 shown in FIG. 4 will be described. In certain embodiments, a server 510 can include the input management subsystem 420, the query management subsystem 430, and the context-rich data repository 410 shown in FIG. 4. A data entry module 535 associated with the server 510 is connected via the network 412 to a workstation 515. External data sources 520 connect via the network 412 to an import/data parser module 540 associated with the server 510. External requestors 525 connect via the network 412 to a data request evaluation module 560 associated with the server 510. A controller or processor 530 in the server 510 operates on the data entry module 535, the import/data parser module 540, the data request evaluation module 560, and also a context-rich mapping module 545, a qualitative assessment module 550, a statistical analysis module 555, a resource interface module 565, and an accounting module 570. The resource interface module 565 is in communication with and provides value lists 572, user profiles and rights 574, international information (e.g., language and measure equivalents) 576, templates 578, ontologies 580 and responses 582. The functions and processing performed by these modules will be described herein below.

FIG. 6 is an overview illustrating one embodiment of a system for data management. The system may be implemented in any suitable software or hardware. In an exemplary embodiment, the system may be implemented in one or more software applications. The application may run on any suitable general purpose single- or multi-chip microprocessor, or any suitable special purpose microprocessor such as a digital signal processor, microcontroller, or a programmable gate array. Depending on the embodiment, certain modules may be removed or merged together.

The system may comprise a parse unstructured data module 610 configured to map unstructured data into data organized in a context-rich data structure. The system may comprise a structured data entry module 620 configured to receive data input from a source which is compatible with a context-rich data structure. The source could be, for example, a user.

The system may comprise a subject template resource module 630 configured to receive data organized from the parse unstructured data module 610 and the structured data entry module 620. The subject template resource module 630 is configured to look up each element of the object in one or more topic structures such as ontologies, taxonomies, or lists related to the selected template. The upper and lower tree elements for each word are retrieved and stored as a reference to the word along with an identification of what resource is used for the match. A contextualization process is run to calculate results for the newly acquired subject terms. Each calculated context value resulting from the foregoing processes is then compared to each field of the template in order to determine the probability of a match with previously analyzed objects of the same meaning.
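The retrieval of upper and lower tree elements for a word may be sketched, under simplifying assumptions, as a lookup in a child-to-parent mapping. The toy taxonomy, the function name, and the shape of the returned reference are hypothetical; an actual topic structure would typically be far larger and may allow multiple parents per term.

```python
def tree_context(term, parent_of, resource_name):
    """Return the upper and lower tree elements for a term.

    parent_of maps each child term to its single parent term, which is a
    simplifying assumption for this sketch. The resource name is stored
    with the result to identify which resource produced the match.
    """
    upper = parent_of.get(term)  # None when the term is a root or unknown
    lower = sorted(child for child, parent in parent_of.items() if parent == term)
    return {"term": term, "upper": upper, "lower": lower, "resource": resource_name}


# Hypothetical toy taxonomy used only for illustration.
TOY_TAXONOMY = {
    "aspirin": "analgesic",
    "ibuprofen": "analgesic",
    "analgesic": "drug",
}
```

Calling `tree_context("analgesic", TOY_TAXONOMY, "toy-taxonomy")` yields `drug` as the upper element and `aspirin` and `ibuprofen` as the lower elements, together with the name of the resource used for the match.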

The system may comprise a qualitative assessment module 650 configured to receive input from the subject template resource module 630 and then look up the recorded attributes in a table for each category in order to calculate a qualitative score for such things as sources, authors or other deterministic variables.

The system may comprise a context-rich mapping module 660 configured to receive input from the qualitative assessment module 650 and then add further information into the structured data.

The system may comprise a universal knowledge repository 670 configured to store data in the form of context-rich data clusters that have a logical structure that facilitates efficient access by any one of many data retrieval engines such as: data mining tools, relational databases, multi-dimensional databases, etc. The universal knowledge repository 670 may be configured to receive structured data from the context-rich mapping module 660. The data stored in the universal knowledge repository 670 may be accessed by other modules. The universal knowledge repository 670 may be any suitable software or hardware for data storage. In one embodiment, the universal knowledge repository 670 is a database.

The system may comprise a search module 690 configured to access the universal knowledge repository 670 for a search directly or via a network. The search module 690 may be any tools or programs suitable for returning a set of search results by searching a database based on a user query request, including search engines provided by Google Inc., Microsoft or Verity.

The system may comprise a filter module 680 in connection with the universal knowledge repository 670 and the search module 690, all connected directly or via a network. The filter module 680 is configured to interpret user preferences from a user message that indicates how to manage ambiguous data sets, and then provide the query syntax containing relevant variables for the search module 690.

The system may comprise an interaction module 692 in communication with the universal knowledge repository 670 and the search module 690, all connected directly or via a network. The interaction module 692 is configured to interact with a user to narrow a search result returned by the search module 690.

The system may comprise a data request evaluation module 694 configured to receive a user query and return a final search result to the user. The data request evaluation module 694 is configured to manage requests for access to and delivery of data stored in the universal knowledge repository 670 by communicating to the filter module 680 and/or the search module 690.

The system may comprise an accounting module 640 configured to communicate with other modules in the system to track user activities such as data submission and for administrative purposes such as billing, credit, or reporting. The accounting module may be useful, for example, in applications where searching or data entry is a service for a fee or where detailed user auditing is required.

FIG. 7 is an example illustrating the process of mapping unstructured data into a context-rich data structure. In the example, the unstructured data comes from a webpage including paragraphs, sentences, and source author. Many other instances or sources of the same phenomenon may be later incorporated in the parallel structure. The unstructured data is converted into a context-rich data structure by applying a series of processes with the help of one or more of the following: subject-specific template, language index, Webster's taxonomy, IEEE ontology, semantic web ontology, and a set of lists.

FIG. 8 is an example of the subject-specific template. The template is divided into a set of field groups including, but not limited to, identification, subject, physical, observable, economic, and social. Each field includes a list of associated items describing the source or usage context of the field.

FIG. 9 is an example illustrating the process of calculating a qualitative score for each entity based on all records associated with that entity. An entity can be anything that may influence the data, such as: source organization, source person, source device, etc. The table records the score of each past transaction indicating, for example, the completeness, integrity, and popularity. Each transaction may have a unique record identification number and is associated with one entity identification. The bottom of the scorecard summarizes the total score associated with a particular entity based on the score of all transactions associated with that entity.
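One possible way to summarize an entity scorecard is sketched below. The sketch assumes that the total for an entity is the simple sum of the completeness, integrity, and popularity scores over all of its transactions; the description above does not fix the combining rule, so the summation, the dictionary keys, and the function name are illustrative assumptions.

```python
from collections import defaultdict


def entity_totals(transactions):
    """Sum per-transaction scores for each entity.

    Each transaction is a dict with hypothetical keys: record_id,
    entity_id, completeness, integrity, popularity. Summation is an
    assumed combining rule, not one mandated by the description.
    """
    totals = defaultdict(int)
    for t in transactions:
        totals[t["entity_id"]] += t["completeness"] + t["integrity"] + t["popularity"]
    return dict(totals)
```

Other summaries, such as an average per transaction, could be substituted where an entity with many low-quality transactions should not outrank an entity with a few high-quality ones.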

FIG. 10 is an example illustrating the process of calculating a qualitative score for each record based on all parameters included in that record. In the example, each record includes parameters A, B, C, . . . , G. A score is given in the table for each parameter. By simply adding the score for each parameter within a record, the score of the record is determined. It will be appreciated that other mathematical approaches may be taken to determine the score of a record based on scores for parameters of that record.
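The additive record score described above, together with one example of an alternative mathematical approach, may be sketched as follows. The optional per-parameter weights are an assumption introduced only to illustrate such an alternative; the function and parameter names are hypothetical.

```python
def record_score(parameter_scores, weights=None):
    """Score a record from the scores of its parameters.

    With no weights, the record score is the plain sum of the parameter
    scores. The optional weights (an illustrative alternative) multiply
    each parameter score; parameters without a weight default to 1.0.
    """
    if weights is None:
        return sum(parameter_scores.values())
    return sum(
        score * weights.get(name, 1.0)
        for name, score in parameter_scores.items()
    )
```

For a record with parameter scores A=3, B=5 and C=2, the additive score is 10; doubling the weight of parameter A raises the score to 13.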

FIG. 11 is an example illustrating the process of scoring each entry within a record. Each record includes a set of entries such as SOURCE, SYSTEM, DATA, or AUTHOR. The qualitative attributes such as A1-A3 associated with each entry are read and a score is calculated for each entry.
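Scoring an entry from its qualitative attributes may be sketched as a table lookup per entry category, in the manner described for the qualitative assessment module. The table contents, attribute labels, and the choice to add the looked-up values are hypothetical and serve only to make the sketch concrete.

```python
# Hypothetical lookup table: entry category -> attribute label -> score.
SCORE_TABLE = {
    "SOURCE": {"peer_reviewed": 5, "government": 4, "blog": 1},
    "AUTHOR": {"credentialed": 4, "anonymous": 1},
}


def entry_score(entry_type, attributes, table=SCORE_TABLE, default=0):
    """Look up each attribute of an entry and add the resulting scores.

    Attributes or categories missing from the table contribute the
    default score, which is an assumption made for this sketch.
    """
    category = table.get(entry_type, {})
    return sum(category.get(attribute, default) for attribute in attributes)
```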

FIG. 12 is an example illustrating a table which may be used in theprocess of mapping unstructured data into a context-rich data structure.Such a process will be described in further detail with regard to FIG.19. The mapping table includes entries including Primary Source,Secondary Sources, Parallel Syntax, and Template Reference. Each entrymay include multiple elements each of which has a correspondingconfidence score (e.g., the qualitative score).

FIG. 13 is an example of a user query message. The query message includes, for example, a requestor identification code used to link the user to his personal profile and accounting log, a template identification code, a response identification code, a query expression, a source quality range or other quality preference variables, and a render level preference.

FIG. 14 is an example illustrating the process of retrieving data based upon user-designated quality parameters. In the example, user A restricts searches in his query to only the highest level of classification authority whose credentials are the highest scoring. However, user B is more interested in a broader scope and specifies in his query that a median value for all sources is preferred.
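The two preferences of FIG. 14 can be contrasted with a short Python sketch. The report tuples, source names, and scores below are hypothetical and serve only to show how the same data yields different answers for user A and user B.

```python
from statistics import median

# Hypothetical reports: (source, source_quality_score, reported_value)
reports = [
    ("agency", 9, 3.10),
    ("retailer", 6, 2.90),
    ("blog", 2, 2.50),
]

def highest_authority(rows):
    """User A: keep only values from the top-scoring source(s)."""
    best = max(score for _, score, _ in rows)
    return [value for _, score, value in rows if score == best]

def median_of_all(rows):
    """User B: take the median value across every source."""
    return median(value for _, _, value in rows)

user_a_result = highest_authority(reports)
user_b_result = median_of_all(reports)
```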

FIG. 15 is a flowchart of one embodiment of a method of formatting unstructured data. The exemplary method may be performed, for example, by the parse unstructured data module as described in FIG. 6. Depending on the embodiment, certain steps of the method may be removed, merged together, or rearranged in order.

In the exemplary method, a data stream is received and then parsed into one or more grammatical elements (e.g. word, sentence, paragraph, etc.). Each element is assigned a unique identification code and attributes indicating which other elements it is a member of (e.g. a word is a member of a sentence, a paragraph and a page). Depending upon the origin of the data, other key fields may be extracted, such as source, date, author, and descriptive markup language (e.g. XML, HTML, SGML). Each element is stored in a suitable memory structure together with associated links.

The method starts at a block 1510, where an input data string or stream is parsed into one or more grammatical objects (e.g. word, sentence, paragraph, etc.). The input data string may be received, for example, from an external web crawler, a data stream such as RSS, or a conventional file transfer protocol. A reference to the location of each grammatical object within the data string is stored. A unique identification number is assigned to each object and stored in memory.
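Block 1510 might be sketched as follows in Python. The splitting rules (blank lines for paragraphs, terminal punctuation for sentences, `\w+` for words) and the dictionary layout are assumptions chosen for brevity; a production parser would need language-aware segmentation.

```python
import itertools
import re

def parse_stream(text):
    """Split text into paragraph, sentence and word objects.

    Each object gets a unique id, its character offset in the input,
    and the id of the object it is a member of (its parent).
    """
    next_id = itertools.count(1)
    objects = []

    def add(kind, value, start, parent):
        obj = {"id": next(next_id), "kind": kind, "text": value,
               "start": start, "parent": parent}
        objects.append(obj)
        return obj["id"]

    # Paragraphs: runs of non-empty lines (assumed convention).
    for para in re.finditer(r"[^\n]+(?:\n[^\n]+)*", text):
        para_id = add("paragraph", para.group(), para.start(), None)
        # Sentences: text up to and including ., ! or ? (naive rule).
        for sent in re.finditer(r"[^.!?]+[.!?]?", para.group()):
            raw = sent.group()
            stripped = raw.strip()
            if not stripped:
                continue
            lead = len(raw) - len(raw.lstrip())
            sent_start = para.start() + sent.start() + lead
            sent_id = add("sentence", stripped, sent_start, para_id)
            for word in re.finditer(r"\w+", stripped):
                add("word", word.group(), sent_start + word.start(), sent_id)
    return objects

sample = "Peanut butter is cheap. Buy it today.\n\nJam costs more."
objects = parse_stream(sample)
```

Storing the parent identifier with each object is one simple way to record that a word is a member of a sentence and a sentence is a member of a paragraph.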

Next at a block 1520, words within each grammatical object are looked up in the index to determine equivalents to one or more words. The equivalent may include, for example, a synonym in the same language or a word or word group in a foreign language having equivalent meaning. In one example, English is used as the system reference language and the data being parsed is French. Each word is looked up in all foreign language indexes to determine the best translation and then a pointer to each language equivalent word is stored. In one embodiment, numbers, measures, and all other non-grammatical objects are converted using the respective cultural indexes.

Moving to a block 1530, each word is statistically analyzed to determine a probability score indicating how closely the word is related to each subject matter field. A series of statistical routines are applied to each element parsed in the previous processes in order to calculate results that uniquely reflect the word positions and relative associations within the object. This result is then compared to an archive of subject matter specific templates that contain similar context-values of other known objects that belong to the subject field represented by the template. If the match results in a probability result over a certain threshold, a link is established in both the record of the object and in the template. The pointer to each word is then stored in at least one subject matter index.
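The threshold test of block 1530 can be illustrated with a small Python sketch. The description does not name the statistical routines, so Jaccard word overlap is swapped in here purely as a stand-in; the template vocabularies and the 0.3 threshold are likewise hypothetical.

```python
def jaccard(a, b):
    """Overlap of two word collections, from 0.0 (disjoint) to 1.0 (identical)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def link_templates(object_words, templates, threshold=0.3):
    """Return (template_id, score) pairs whose score exceeds the threshold."""
    links = []
    for template_id, vocabulary in templates.items():
        score = jaccard(object_words, vocabulary)
        if score > threshold:
            links.append((template_id, score))
    return sorted(links, key=lambda pair: pair[1], reverse=True)

# Hypothetical subject matter templates reduced to word sets.
templates = {
    "grocery": {"peanut", "butter", "price", "jar"},
    "finance": {"stock", "price", "bond"},
}
links = link_templates(["peanut", "butter", "price"], templates)
```

In this example only the grocery template clears the threshold, so only that link would be recorded with the object and the template.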

Next at a block 1540, a value is stored for each attribute within each object. The attributes of each object may include, e.g., source, author, date, time, location, version, security level, and user permissions. In one embodiment, not all attributes within an object have a value stored. One or more attributes may be left unassigned.

Referring to FIG. 16, a flowchart of an embodiment of a process performed by the structured data entry module 620 shown in FIG. 6 will be described. FIGS. 6, 8, 9, 10, 11 and 12 are also referred to in the discussion of the structured data entry module 620.

Process 620 begins at a start state and moves to state 1610 where a user logs on to the system (e.g., server 510, FIG. 5), registers a user profile and provides credentials if applicable. The qualitative process described below depends in part upon the quality of the credentials presented or recorded into the profile with regard to the source entity, data generation device or software, author, etc., and can include:

a. Measurement equipment brand, model, serial, specifications
b. Transaction data
c. Software package, version, routine
d. Information retrieved from an external source file/system, e.g., prices, inventory levels, sensor values.

Proceeding to state 1615, process 620 starts an accounting log. In some applications, data entry is a fee-based service or detailed user auditing is required. The Accounting module 640 (FIG. 6) is responsible for tracking user activity and data submissions for later billing, credit or reporting. Continuing at state 1620, based upon the profile and credentials presented above, an access control subsystem sets user level, rights and permissions that regulate what type of data may be entered or edited, what can be seen and what related activities are provided. Advancing to state 1625, process 620 looks up the user identification (ID) in the index to determine what actions are allowed. Based upon the profile, the system presents the user with a list of subject areas he or she is permitted to enter.

Proceeding to state 1630, process 620 generates a new record or copies/inherits from an existing record and, at state 1635, assigns a unique identification code to the record. Continuing at state 1640, process 620 stores, in certain embodiments, the user ID, date, time and location for each data modification.

Advancing to state 1645, process 620 calls upon the Subject Template and Resource module 630 (FIG. 6) to present the user with a data entry form derived from the subject specific template (e.g., FIG. 8) selected above. Each field of the form may potentially discipline the user input to valid data ranges or other performance standards. The process continues by looking up each element of data entry in one or more topic structures such as ontologies, taxonomies or lists related to the selected template. The upper and lower tree elements for each word are retrieved and stored as a reference to the word along with an identification of what resource was used for the match. A Contextualization process then calculates results for the newly acquired subject terms. Each calculated context-value resulting from the foregoing processes is then compared to each field of the template in order to determine the probability of a match with previously analyzed objects of the same meaning. Templates are designed to capture a context-rich data structure and include such resources as:

- Source identity, location, file format and size
- Higher, lower and equivalent subject classifications together with a reference to source
- Observable records (e.g., photos, videos, radar signature, IR, temperature, etc.)
- Physical measures (e.g., height, weight, depth, latitude, atomic mass, chemical structure, etc.)
- Economic variables (e.g., price, stock, manufacturer names, retailer names).

At the completion of state 1645, process 620 moves to state 1650 where an Internationalization process loads each word entered in the form, looks it up in all foreign language indexes to determine the best translation and then stores a pointer to each language equivalent word. Numbers, measures, and all other non-grammatical objects are converted using the respective cultural indexes.

Proceeding to state 1655, processing moves to the Qualitative Assessment module 650 (FIG. 6) where the recorded attributes are looked up in a table for each category in order to calculate a qualitative score. See also FIGS. 9-11. In an example, the source of the information in the object comes from a French government agency with a high score indicating that the source has been authenticated, the data is valid and previous experience has been of high quality. The score is returned for association with the element and the source table is updated.

Proceeding to state 1660, processing continues at the Context-rich Mapping module 660 (FIG. 6) that loads each word for every field in the current form being processed into the primary source record of a mapping table along with the associated qualitative scores. See also FIG. 12. This repeats for each subsequent word and/or element contained in the form. The module then proceeds to load contents into secondary source records if they exist at this time. Otherwise the module loads contents into parallel syntax records corresponding to each source field and organized by the relevant hierarchical location in the subject tree. For example, the first primary source word would be looked up in the first topic tree, and the next highest term (along with the tree ID and quality attribute) would be retrieved and placed in the first parallel syntax record in a position associated with the first primary source word. Lastly, the module loads the calculated results from template analysis in the corresponding fields for each primary source element. At the completion of state 1660, process 620 ends at an end state.

FIG. 17 is a flowchart of one embodiment of a method of linking a data object to one or more templates. The exemplary method may be performed, for example, by the subject template and resources module as described in FIG. 6. Depending on the embodiment, certain steps of the method may be removed, merged together, or rearranged in order.

The method looks up each element of the parsed object in one or more topic structures such as ontologies, taxonomies or lists related to the selected template. An example of the template is illustrated in FIG. 8. The upper and lower tree elements for each word are retrieved and stored as a reference to the word along with an identification of what resource was used for the match. A contextualization process is run to calculate results for the newly acquired subject terms. Each calculated context-value resulting from the foregoing processes is then compared to each field of the template in order to determine the probability of a match with previously analyzed objects of the same meaning. In one embodiment, templates are designed to capture a context-rich data structure to include such resources as: a) source identity, location, file format and size; b) higher, lower and equivalent subject classifications together with a reference to source; c) observable records (e.g. photos, videos, radar signature, IR, temperature, etc.); d) physical measures (e.g. height, weight, depth, latitude, atomic mass, chemical structure, etc.); and e) economic variables (e.g. price, stock, manufacturer names, retailer names).

The method starts at a block 1710, where a first object is loaded. Next at a block 1720, each object is looked up in the template index to identify one or more related templates. Moving to a block 1730, a reference link is established between an object and each related template. Next at a block 1740, the results from block 1730 are compared to values stored in the index of subject templates.

If a match is found at a block 1750, the method moves to a block 1760. At the block 1760, the associations are stored with both the template and the object before the method moves next to block 1770. If no match is found at block 1750, the method jumps to block 1770.

Next at block 1770, statistical analysis of each object in relation to another object is performed. The analysis result is then stored in the index with pointers to the objects. As described earlier, each object could be, e.g., a word, a sentence, a paragraph, an article, and so on.

Moving to a block 1780, mathematical analysis of other objects in the associated template is performed. Again, the analysis result is stored in the index with pointers to the objects. In one embodiment, pointers to templates are stored as parallel syntax for each object.

Next at a block 1790, a word is looked up from the container in a semantic tree to retrieve upper and lower tree elements for storage or association with the word. The semantic tree could be, for example, several different ontologies or taxonomies.

FIG. 18 is a flowchart of one embodiment of a method of evaluating the quality of a data object. The exemplary method may be performed, for example, by the qualitative assessment module as described in FIG. 6. Depending on the embodiment, certain steps of the method may be removed, merged together, or rearranged in order. The method of evaluating the quality of a data object is further illustrated earlier in FIGS. 9-11.

In the method, the attributes recorded in the object are looked up in a table for each category in order to calculate a qualitative score. In one example, the source of the information in the object comes from a French government agency with a high score indicating that the source has been authenticated, the data is valid, and has been of high quality in past experience. The score is returned for association with the element and the source table is updated.

The method starts at a block 1820, where an object is loaded. Moving to a block 1830, the object is looked up in an associated table. The table could be, for example, SOURCE, SYSTEM, DATA, or AUTHOR. Next at a block 1840, attributes associated with the object are read. In one embodiment, a mathematical function is applied to generate a score. In another embodiment, a stored score is retrieved.

Last at a block 1850, the score/value associated with the object is returned. In one embodiment, the score/value indicates the quality of the data object.
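Blocks 1820 through 1850 can be sketched in Python as a table lookup followed by either a stored score or a computed one. The table contents, attribute names, and the use of a plain sum as the "mathematical function" are assumptions made for this sketch.

```python
# Hypothetical attribute tables keyed by category, then by object key.
TABLES = {
    "SOURCE": {"fr-gov-agency": {"authenticated": 1, "valid": 1, "history": 1}},
    "AUTHOR": {"anonymous": {"authenticated": 0, "valid": 1, "history": 0}},
}

# Hypothetical cache of previously stored scores.
STORED_SCORES = {("SOURCE", "fr-gov-agency"): 10}

def quality_score(table, key):
    """Return a stored score if one exists, otherwise compute one.

    The computed score is a plain sum of the attribute values, which
    stands in for whatever function an embodiment applies.
    """
    if (table, key) in STORED_SCORES:
        return STORED_SCORES[(table, key)]
    attributes = TABLES[table][key]   # blocks 1830 and 1840
    return sum(attributes.values())   # block 1850

source_score = quality_score("SOURCE", "fr-gov-agency")
author_score = quality_score("AUTHOR", "anonymous")
```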

FIG. 19 is a flowchart of one embodiment of a method of adding contextual data to a data object. The exemplary method may be performed, for example, by the context-rich mapping module as described in FIG. 6. Depending on the embodiment, certain steps of the method may be removed, merged together, or rearranged in order.

In this method, the first word of the first element of the current object is first loaded into the Primary Source record of a mapping table, such as the mapping table illustrated in FIG. 12, along with the associated qualitative scores. The same process is repeated for each subsequent word and/or element contained in the object. The foregoing fields are also loaded into Secondary Source records if such data exists at this time. Otherwise these fields are loaded into Parallel Syntax records corresponding to each Source field and organized by the relevant hierarchical location in the subject tree. Lastly, the calculated results from Template analysis are loaded in the corresponding fields for each Primary Source element.

The method starts at a block 1910, where, for each object, source values of the object and of parallel syntaxes are analyzed to generate unique string values, which are indicative of a narrow meaning or concept. Each parallel syntax expresses a different level of abstraction from the primary source. In one example, it may be possible to have several source documents that appear to present conflicting information but, following the process described here, the second highest parallel syntax might reveal that all the documents merely express the same idea in different terms. The concept strings are then stored in the index.

Moving to a block 1920, a parallel syntax record is created for each object where each source word has a parallel field entry.

Moving to a block 1930, context-rich data clusters are created as concatenated strings of delimited values or references that group contextual data related to the object into logical structures. The logical structures may be, for example, a field, a record, a document, URLs, a file or a stream.
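A context-rich data cluster of the kind created at block 1930 could look like the following Python sketch. The `|` and `=` delimiters, the key names, and the sample values are assumptions; the sketch also assumes that values never contain the delimiters themselves.

```python
def build_cluster(object_id, context, field_sep="|", pair_sep="="):
    """Concatenate an object's contextual values into one delimited string."""
    parts = [f"id{pair_sep}{object_id}"]
    for key in sorted(context):
        parts.append(f"{key}{pair_sep}{context[key]}")
    return field_sep.join(parts)

def parse_cluster(cluster, field_sep="|", pair_sep="="):
    """Recover the key/value pairs from a cluster string."""
    return dict(part.split(pair_sep, 1) for part in cluster.split(field_sep))

# Hypothetical contextual references for one object.
cluster = build_cluster(
    "obj-17",
    {"source": "retailer-3", "template": "T8.v2", "parent": "spreads"},
)
recovered = parse_cluster(cluster)
```

Sorting the keys keeps the string stable for a given set of values, which makes clusters easy to compare or index.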

Next at a block 1940, the context-rich data cluster may optionally be exported to a data exchange format (e.g., XML, HTML, RTF, DBF, PDF). In the data exchange format, reference links are retrieved together with page layout/rendering attributes.

FIG. 20 is a flowchart of one embodiment of a method of searching a database based on a user query. The exemplary method may be performed, for example, by the data request evaluation module as described in FIG. 6. Depending on the embodiment, certain steps of the method may be removed, merged together, or rearranged in order.

The method starts at a block 2010, where a user query message is received and loaded. The user query message may be in various formats. In the exemplary embodiment, the message is in the format of the request message illustrated in FIG. 13.

Moving to a block 2020, message fields are parsed according to the message format and stored in a registry.

Next at a block 2030, each parsed field is loaded and a lookup is performed in the profile database. In the profile database, a requestor identification number is linked to an authorized person or entity. Based on the requestor identification number provided in the user query message, parameters related to a particular use may be determined, such as security level, render level preference, permissions, and billing rate. The render level preference indicates the depth or robustness of information desired for the application. Depending upon the user permissions and render level preference, a query reply may contain several words or several terabytes.

In another embodiment, the parsed message may further comprise one or more of the following: a template identification code, a response identification, a query expression, and a set of source quality parameters. The template identification code indicates if the search query is to be directed to a specific element of a subject-matter specific template. The response identification refers to specific information needed to send a reply to the requesting party. The query expression may be any one of many query syntaxes in use such as SQL. The set of source quality parameters indicates how the raw data will be filtered prior to conducting the query so as to conform with the requesting party's qualitative restrictions. The qualitative restrictions may pertain to, for example, source, author, data, system, etc.
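The parsing of block 2020 might be sketched as follows in Python. The description leaves the wire format open, so JSON is assumed here, and the field names mirror the items listed above without being taken from FIG. 13 itself.

```python
import json
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class QueryMessage:
    requestor_id: str
    query_expression: str
    template_id: Optional[str] = None
    response_id: Optional[str] = None
    source_quality_range: Optional[Tuple[int, int]] = None
    render_level: Optional[str] = None

def parse_query_message(raw):
    """Parse a JSON-encoded query message (assumed format) into fields."""
    fields = json.loads(raw)
    quality = fields.get("source_quality_range")
    return QueryMessage(
        requestor_id=fields["requestor_id"],
        query_expression=fields["query_expression"],
        template_id=fields.get("template_id"),
        response_id=fields.get("response_id"),
        source_quality_range=tuple(quality) if quality else None,
        render_level=fields.get("render_level"),
    )

# Hypothetical message; every value here is illustrative only.
raw_message = json.dumps({
    "requestor_id": "U-1001",
    "query_expression": "peanut butter",
    "source_quality_range": [8, 10],
})
message = parse_query_message(raw_message)
```

Optional fields default to `None`, which matches the statement that the message may comprise one or more of the listed items.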

One example is illustrated earlier in FIG. 14. In that example, one user might restrict searches to only the highest level of classification authority whose credentials are the highest scoring, while another user might be more interested in a broader scope and request a median value for all sources.

Moving to a block 2040, the template identification code is looked up if it is provided in the user query message. A template corresponding to the template identification code is then retrieved.

Next at a block 2050, certain user variables (e.g., source quality) are loaded into the query filter.

Moving to a block 2060, the user query message and filter variables are sent to a search module such as the search module 690 as described earlier in FIG. 6. The search module may be any tool or program suitable for returning a set of search results by searching a database based on a user query request, including the search engine provided by Google Inc.

Next at a block 2070, search results are received from the search module. In one embodiment, a determination is made as to whether the received search results reflect two or more plausible search paths matching different subject templates.

Moving to a block 2080, the user query and its results are logged into the requestor's profile and the template profile for tracking.

Next at a block 2090, the results are forwarded to the user in accordance with the user's preferences. In some cases, additional user input is required in order to further refine the search result to a single subject area.

FIG. 21 is a flowchart of one embodiment of a method of generating query syntax based on a user query message. The exemplary method may be performed, for example, by the filter module as described in FIG. 6. Depending on the embodiment, certain steps of the method may be removed, merged together, or rearranged in order. In one embodiment, the query syntax is then sent to a search module as input for a search operation.

The method starts at a block 2110, where one or more user filter parameters are received from a parsed user query message. Next at a block 2120, a table may be created for storing each filter parameter. Moving to a block 2130, each filter parameter value is looked up in the parameter index to determine proper query syntax in order to achieve the desired search result. Next at a block 2140, the syntax message is passed to the search module as described in FIG. 6.
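Blocks 2110 through 2140 can be illustrated with a Python sketch that turns filter parameters into a SQL-style condition. The parameter index contents, the column names, and the use of `?` placeholders are assumptions; the description only states that the index yields proper query syntax.

```python
# Hypothetical parameter index: filter name -> SQL fragment with a placeholder.
PARAMETER_INDEX = {
    "min_source_quality": "source_quality >= ?",
    "max_source_quality": "source_quality <= ?",
    "zip_code": "zip_code = ?",
}

def build_filter_clause(filters):
    """Look up each filter parameter and assemble a WHERE-style clause.

    Values are returned separately so the search module can bind them,
    rather than splicing user input into the query text.
    """
    clauses, values = [], []
    for name, value in filters.items():
        if name not in PARAMETER_INDEX:
            raise KeyError(f"unknown filter parameter: {name}")
        clauses.append(PARAMETER_INDEX[name])
        values.append(value)
    return " AND ".join(clauses), values

clause, values = build_filter_clause(
    {"min_source_quality": 8, "zip_code": "92109"}
)
```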

FIG. 22 is a flowchart of one embodiment of a method of interacting with a user to narrow a search result. The exemplary method may be performed, for example, by the interaction module as described in FIG. 6. Depending on the embodiment, certain steps of the method may be removed, merged together, or rearranged in order.

The method starts at a block 2210, where a query reply including a search result is received from a search module (e.g., the search module 690 in FIG. 6). Next at a block 2220, the result is parsed to determine the number and quality of the subject trees.

Moving to a block 2230, it is determined whether there are fewer than two subject trees. If the answer at block 2230 is yes, the method moves to a block 2280. If the answer at block 2230 is no, the method then moves to a block 2240.

Next at block 2240, the user profile associated with the user issuing this query and templates are evaluated to determine which subject trees have the highest probability of satisfying the user query. Moving to a block 2250, a message is sent to the user requesting him to select one of the trees representing a path, for example, which is most likely to narrow the search result. Next at a block 2260, a reply is received from the user and a new search query is generated based on the current user query and the user reply. Moving to a block 2270, the new query is sent to the search module and the method moves back to block 2220.

At block 2280, it is determined whether the matches in the search result exceed a user-defined preference. If the answer at block 2280 is yes, the method goes back to block 2220. Otherwise, the method moves to a block 2290.

At block 2290, a final search result is forwarded to the user. The final search result may be narrower than the original search result included in the query reply from the search module.

Next at a block 2292, information related to the current search is stored in the user profile and subject matter profile for future reference.
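The narrowing loop of blocks 2220 through 2270 might be sketched in Python as shown here. The sketch is deliberately partial: it omits the match-count check of block 2280 and the profile logging of block 2292, and the `search` and `ask_user` callables, the catalog, and the rule of appending the user's choice to the query are all hypothetical.

```python
def narrow_search(query, search, ask_user, max_rounds=5):
    """Repeat the search until fewer than two subject trees remain."""
    result = search(query)
    for _ in range(max_rounds):
        trees = result["trees"]            # block 2220
        if len(trees) < 2:                 # block 2230
            break
        choice = ask_user(trees)           # blocks 2250 and 2260
        query = f"{query} {choice}"        # assumed way to refine the query
        result = search(query)             # block 2270
    return query, result

# Hypothetical catalog mapping product names to subject trees.
CATALOG = {
    "peanut butter jar": "jar",
    "peanut butter cookies": "cookies",
    "peanut butter ice cream": "ice cream",
}

def fake_search(query):
    """Stand-in search module: match names containing every query term."""
    terms = query.lower().split()
    matches = [name for name in CATALOG
               if all(term in name.split() for term in terms)]
    return {"matches": matches, "trees": sorted({CATALOG[m] for m in matches})}

final_query, final_result = narrow_search(
    "peanut butter", fake_search, ask_user=lambda trees: "cookies"
)
```

The `max_rounds` bound is an added safeguard so that the sketch always terminates even if the user's replies never reduce the number of trees.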

APPLICATION EXAMPLE

One example embodiment of the system would be the creation of a comprehensive, industry-wide database with full transparency. One of the main inhibitors to free trade is the imperfect availability of real-time commerce information. Referring to the domestic grocery industry as an example, many participants up and down the value chain from consumer to manufacturer make daily decisions with partial information. A consumer who wants to purchase a list of products at the lowest possible price can rarely afford to comparison shop every product on the list. The industry exploits this fact by running promotions on certain products to draw a consumer in the door on the expectation of making up the difference with higher margins on other products.

All parties stand to gain if this system were implemented. To illustrate the point, this discussion focuses on the consumer-retailer benefits, but it will become apparent that it is equally applicable to other value chain participants as well. A service provider operating the system 400 (FIG. 4) would contact all retailers in a given region and notify them of the opportunity to make their product information directly accessible to the public in order to facilitate their shopping needs. Five retailers decide to participate and arrange to have their proprietary databases 429 connect via a network to the input management system 420. As each provider has previously registered with the service provider, they connect to the structured data entry module 620 (FIG. 6) where the system authenticates the connection and logs them into the accounting module 640 where they may be assessed a fee for using the system. As each retailer has its own proprietary system, the subject template module 630 retrieves both a template for each retailer and a template for each product group contained in the retailer's database. Products may be stored with different formats in different fields and spelled or described in different ways. Moreover, stock level and pricing may be expressed in incompatible ways. The retailer specific template directs the system in how to parse the proprietary information, and a product template similar to FIG. 8 aids the system in classification and linkage to contextual records, like manufacturer resources (recipes, photos, promotions), government warnings or recent news articles about the product. The qualitative assessment module 650 evaluates each data entry for quality and records scores for the retailer, its computer system and software and the operator in charge, as outlined in FIGS. 9, 10 and 11. Finally, the context-rich mapping module 660 maps each data element using a method similar to that shown in FIG. 12 where each primary element is assigned one or more parallel records that reflect duplicate information or semantic equivalents as determined from external resources similar to that shown in FIG. 7.

The end result of the above process is a real-time knowledge repository that provides consumers with near perfect insight into what products are available on the best terms from a retailer. Moreover, the context-rich mapping and subject specific template have created associations for each product record so that consumers can instantly see which retailer is the best for their specific shopping list, taking into consideration special offers from related parties, available stock, and other incentives.

A consumer might access the system 400 shown in FIG. 4 via a public search engine 436 like Google. An example search for ‘peanut butter’ (see FIG. 13) might take the consumer to the data request evaluation module 694 (FIG. 6) where a cookie in her browser would trigger her user account 640 and her preference for restricting searches to high-authority data sources within her zip code of 92109, similar to that shown in FIG. 14. The user preferences would be forwarded to the filter module 680, combined with the search query and passed to the search module 690. The resulting matches would contain more than one category, and the interaction module 692 would pass the user a question asking her to choose from ‘peanut Tread’ or ‘peanut butter cookies’ or ‘peanut butter ice cream’. The user's response would then be forwarded back to the search module and the final result returned to the user.

CONCLUSION

The foregoing description details certain embodiments of the invention. It will be appreciated, however, that no matter how detailed the foregoing appears in text, the invention may be practiced in many ways. It should be noted that the use of particular terminology when describing certain features or aspects of the invention should not be taken to imply that the terminology is being re-defined herein to be restricted to including any specific characteristics of the features or aspects of the invention with which that terminology is associated.

While the above detailed description has shown, described, and pointed out novel features of the invention as applied to various embodiments, it will be understood that various omissions, substitutions, and changes in the form and details of the device or process illustrated may be made by those skilled in the technology without departing from the spirit of the invention. The scope of the invention is indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

What is claimed is:
 1. A computer-implemented method of collecting data input to a multi-contextual and multi-dimensional database, the method comprising: receiving input data from a first user or first device logged on to a computing system with valid credentials; providing a template that substantially most closely matches the input data including subject matter specific settings; generating a new record based on the template; and assigning a unique identification code to the new record.